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Concordant computation is a circuit-based model of quantum computation for mixed states, that 
assumes that all correlations within the register are discord-free (i.e. the correlations are essentially 
classical) at every step of the computation. The question of whether concordant computation 
always admits efficient simulation by a classical computer was first considered by B. Eastin in 
quant-ph/1006.4402vl, where an answer in the affirmative was given for circuits consisting only of 
one- and two-qubit gates. Building on this work, we develop the theory of classical simulation of 
concordant computation. We present a new framework for understanding such computations, argue 
that a larger class of concordant computations admit efficient simulation, and provide alternative 
proofs for the main results of quant-ph/1006.4402vl with an emphasis on the exactness of simulation 
which is crucial for this model. We include detailed analysis of the arithmetic complexity for 
solving equations in the simulation, as well as extensions to larger gates and qudits. We explore 
the limitations of our approach, and discuss the challenges faced in developing efficient classical 
simulation algorithms for all concordant computations. 


I. INTRODUCTION 

Understanding the hardness of simulating quantum computation on a classical computer is a central question in 
quantum-computing theory. Efforts to address this question are important to help identify candidates for quantum 
algorithms which outperform their classical counterparts. An essential aspect is to identify classes of quantum al¬ 
gorithms which fail to admit any speed-up compared to their classical counterparts, since this can give us insights 
into the aspects of quantum mechanics that might be responsible for any quantum computational speedup. However, 
there are still relatively-few general results in this area. Often insight can be gained by the development of efficient 
simulation methods for certain families of quantum physical processes and quantum circuits. 

A prominent example is the role of entanglement in unitary circuit-based quantum computation over pure states 
Q,i- For this model, exponential speedup with respect to classical computation requires that certain measures 
of entanglement scale with problem size. This indicates that entanglement plays an important role in pure-state 
circuit quantum computation. However, entanglement-scaling on its own does not provide a sufficient condition for 
a computational speedup. For example, highly-entangling circuits using only gates from the Clifford group can be 
efficiently simulated via the Gottesman-Knill theorem |3(. To add further nuance, there exist models of universal 
quantum computation where certain entanglement measures may remain small and even tend to zero with growing 
computational size 0. These results demonstrate that the role played by entanglement in pure-state computations 
is a subtle one. 

Much less is known, on the other hand, about quantum computation over mixed states. For pure states, absence of 
entanglement (separability) implies a tensor-product state, and coincides with the absence of any correlation. However, 
for mixed states separability is a much weaker constraint than being uncorrelated, and the correlations in such states 
can exhibit both classical and non-classical correlations. A long-standing question is whether general unitary circuits 
acting on separable states can be efficiently simulated classically 0 (i.e. assuming that the register remains separable 
at every stage of the computation). Since classically-hard probability distributions can be sampled from simple 
quantum circuits Q and linear optical networks Q, quantum-generated states may be hard to classically simulate 
even in the absence of entanglement. Indeed, classical N-bit probability distributions require exponentially-many 
parameters for their descriptions, just like entangled pure states. 

A well-studied model of mixed-state computation is the DQC1 or “one-clean-qubit” model, which uses a (partially)- 
pure control qubit and a register of qubits prepared in the maximally-mixed state 0. In this model, the normalized 
trace of a unitary circuit may be well approximated by the average of measurements on the control at the output. The 
role of entanglement in DQC1 was studied in Ref. Q. It was found that entanglement in the output state, quantified 
using multiplicative negativity across bipartite cuts, becomes a vanishingly-small fraction of the maximum possible 
as the size of the register increases [§j. 

Later studies looked at the generation of discord in the output state of DQC1 Bi, looking specifically at the 
correlations between the control qubit and the entire register. For “typical” unitaries, defined as unitaries sampled 
using the Haar measure, it was found that discord remains a fixed fraction of the maximum as the number of register 
qubits increases 0. We remark, however, that the normalized trace of Haar-random unitaries converges to zero as the 
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size of the matrix gets large, since the corresponding eigenvalues are uniformly-distributed phases between 0 and 2-7 t. 
Hence the output of such DQC1 computations is known in this limit, and nothing can be concluded about algorithmic 
speedup. Ref. 0 also provided a condition for the generation of no discord at the output of DQC1. Nonetheless, 
the relationship between entanglement and discord in cases of the DQC1 model with apparent algorithmic speedup 
remains little understood. 

Important progress in the study of the role of correlations in mixed-state computation was made by Eastin in 
Ref. |ll|, where concordant computation was defined and first analysed. A concordant state, sometimes called a “fully- 
classical” state, is defined as a state which is diagonal in the computational basis up to local-unitary transformation. 
A concordant computation is one which satisfies the promise that the state of the system remains concordant after 
each unitary gate in the computation. Concordant states are closely related to classical probability distributions, 
their only non-classical attribute being the local-unitary freedom of their density-matrix eigenbasis. Thus they can 
be characterised via a probability distribution and local unitaries. 

Monte-Carlo simulation has been the basis for a number of methods for efficiently simulating physical processes 
and quantum circuits EMI. Using a Monte-Carlo method, Eastin presents a general procedure [11] for simulating 
concordant computations which samples the output statistics. He argues that it is an efficient algorithm on a classical 
computer when circuits are restricted to one- and two-qubit gates. However, there are examples of concordant 
computation that admit efficient simulation but which do not fall within Eastins’ results. 

Consider the following example of a concordant circuit for DQC1 using gates of unrestricted size: The ini¬ 
tial state, comprising a pure control qubit and N fully-mixed register qubits, is |-|-)(+| x ® (1/2^) l 2 -..iv+i (where 
|±) = (|0) ± |1)) / a/ 2) and the circuit consists of a series of controlled-gates G\---Gt which are Hermitian and 
diagonal in the computational basis, for which Gk ■ |0) 1 |x ) 2 ... N+1 H > |0) x |x) 2 , 1 and Gk : 11) x |x) 2 ... JV , 1 i—>■ 

(— l)^ fc ( x ) |1) 1 |x ) 2 ... N+1 (where /fc(x) takes values 0 or 1). The output state is then (1/2^) Ex|/(x)=o l+X+li ® 
l x Xx| 2 ...j V + 1 + (1/2 W ) Ex|/(x)=i |->Hi ® |x)(x| 2 ... iv+1 where /(x) = ELi h (x) (mod 2), and the expectation value 
for measurements on qubit 1 in the |±) basis is the average value for (— l)-^ x) over all bit strings x. This computation 
admits a straightforward Monte-Carlo simulation by sampling measurement outcomes for pure-state trajectories given 
input states |+) 1 |x) 2 ... Ar+1 (where x is a bitstring chosen uniformly at random). 

In this paper, we develop new technical tools to understand concordant computation, and use these to extend and 
refine Eastin’s results. In particular, we prove explicitly that our simulation is exact, an essential aspect of this model, 
since the concordance of a state is not preserved under arbitrarily-small perturbations [16] . 

We proceed as follows: Sec. El provides an informal introduction to concordant computation and some key ideas 
for simulating it. This includes general features of the states involved and specific requirements for simulating 
computations. Sec. urn revisits the central results of Ref. 0 using a new formalism. It provides self-contained 
proofs within our revised framework. Sec. IIVI presents a new simulation procedure that bypasses a bottleneck when 
identifying symmetries of the system state, which is a critical part of the procedure of Ref. [ll]. In Sec. IVlwe explain 
the limitations of our simulation procedure, before concluding in Sec. IVII with some discussion of the prospects of 
efficient simulations for all concordant computation. 


II. OVERVIEW 

In this section, we provide an overview of the techniques that are developed in the rest of this paper. We start with 
the following formal definition for a state to be concordant: 

Definition A state p, with N qudit subsystems labeled by j, of arbitrary dimension dj, is called concordant if every 

(j) 

qudit possesses a complete set of orthogonal rank-1 projectors 7r):\ such that 

P = P(k 1 ,k 2 ,--- ,k N )w { ^ ® ‘ ‘ ® 7r^ } (1) 

ki ,&2 >••• ,&IV 

where p(k\, • • • , fcjv) is a probability distribution. 

Concordant states have the interpretation of being the only quantum states having zero non-classical correlation with 
respect to any bipartition of the subsystems (see for example Refs. 0 ). (Non-classical correlations can be quantified 
using quantum discord [l8[ or a variety of related measures Ref. 0 .) The basic premise of the model called concordant 
computation is that an algorithm is supplied, consisting of a choice of a concordant initial state, unitary circuit, and 
measurements, such that the quantum state remains concordant after each gate acts. The generation (or “encoding”) 
of algorithms is not of concern here — only the simulation (“decoding”) of algorithms which satisfy the promise of 
concordant states between gates. 
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The most basic example of concordant computation is given by probabilistic classical computation using reversible 
gates — which amounts to concordant computation in the computational basis. The expectation values for observ¬ 
ables at the output can be evaluated efficiently by a Monte-Carlo method, which uses simulation trajectories on bit 
(dit) strings which are computed directly from the circuit given for the computation. More generally, concordant 
computations can involve entangling gates and changes to the local basis as the computation proceeds. To illustrate, 
if a CNOT gate acts on a concordant state p, for which both qubits on the support of the gate are in the |±) basis, 
then these basis elements are permuted and p remains concordant. However, if the control qubit is in the |±) basis 
and the target qubit is in the computational basis, then the gate maps basis elements to Bell states, and p may or may 
not remain concordant depending on its symmetries: Only if p is invariant under | + 0) o | — 0) and | + 1) -o- | — 1} 
is concordance preserved. 

More generally, the symmetries of the quantum state play a central role in concordant computation. We say that 
a unitary S with support on qudits & is a symmetry of state p if SpS t = p. The collection of all such unitaries (on 
b ) defines a symmetry (sub)group. For example, when p has fully non-degenerate eigenvalues, and is diagonal in the 
computational basis, the symmetries include the identity operators and any phase gate. To make the consequences 
of symmetry manifest, let us write concordant states in Eq. [T] in a different form. This form is related to the original 
definition of a concordant state as a state of zero discord - see Sec. IIII Al for more details. Given any concordant state 
p, and any partition of the qubits (or qudits) into subsets a and b, p can always be written 

P = E^ a) ® n i &) > ( 2 ) 

k 

where is a set of orthogonal projectors which are related to the computational basis by local-unitary operations, 
and the pC 1 are (unnormalised) density operators. After collecting terms in the sum for which p 1“^ = pffl, this 
decomposition of p becomes unique (as proved in Sec. IIII All and we call the in this case Full-Rank Subsystem 

Eigenprojectors (FRASEs). Any symmetry S of p with support on b then satisfies SH^ S f = Ilj,^ for the corresponding 
FRASEs. 

When the spectrum of p is fully non-degenerate, any set of gates which implements a concordant computation must 
map product states to product states at every step, and efficient Monte-Carlo trajectory simulation is (trivially) pos¬ 
sible. However, when concordant states have degenerate eigenvalues, the problem of classically simulating concordant 
computation becomes non-trivial and more interesting. The degeneracy allows the gates specified in the problem to 
generate trajectories which create entanglement but leave the state concordant. A computational basis representation 
of such a trajectory will require exponentially-growing resources. 

The degeneracy in the quantum state, and the symmetries which follow from it, therefore disrupt naive trajectory 
simulation. Fortunately, the degeneracy itself gives rise to a new way to construct trajectories by providing for families 
of equivalent unitary gates that lead to the same output state. We will say that gates G and G, with support on 
qudits &, are equivalent with respect to concordant state p, if GpG ^ = Gp&. It is easily verified that this last equation 
is also equivalent to the existence of a symmetry S of p on b satisfying G = GS. Hence, the challenge for efficient 
classical simulation of a concordant computation is to find circuits of equivalent gates which define trajectories with 
an efficient simulation (where the computational requirements of all steps in the procedure are accounted for). 

A key insight of Eastin in Ref. [11] is that if a gate G acting on subset of qudits b maps a concordant state to a 
concordant state, then it is equivalent to the following sequence of gate operations which also act only on b: a local 
unitary, a classical reversible gate, and a second local unitary. (By classical-reversible gate we mean a gate which 
permutes logical basis states, e.g. a CNOT, NOT or TOFFOLI gate.) The action of the two local unitaries is to 
first rotate the local basis of the state into the computational basis, and then rotate it to the new local basis for the 
state. If this set of alternative gates is known then a Monte-Carlo simulation will proceed via product states, and the 
classical simulation will be efficient. For any concordant computation this set of gates always exists, and the challenge 
is to efficiently compute it. 

One approach to finding this equivalent gate set would be to identify all symmetries of the state (on b) and then 
search over gates to identify those with the needed properties. However, the symmetry identification can involve 
exponentially-big matrices, since it involves an exhaustive search over the full state p, leading again to inefficient 
simulation. Furthermore, any attempt to find trajectories that relies on testing on the whole quantum state (e.g. 
direct application of the criteria for classicality given in Ref. 0) will typically suffer from exponentially-scaling 
overheads. As noted by Eastin, one can show that the identification of symmetries of an initially-uncorrelated state 
after a circuit of unitary gates (even a set of classical reversible gates) is NP hard in general. Furthermore it is widely 
believed that not even universal quantum computers can solve NP-complete problems in polynomial time. 

Nonetheless, Eastin argued that all necessary symmetry identification can be performed efficiently for concordant 
computations comprising circuits of one and two-qubit gates, together with some restriction on the form of the initial 
state (see Sec. IIII Bl for an alternative proof of this). Note that, although a universal gate set can be obtained using only 
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one-qubit and two-qubit gates [2ll| , it does not follow that concordant computations comprising three-and-higher qubit 
gates always (efficiently) decompose into concordant computations with one-qubit and two-qubit gates. Furthermore, 
the question of when qudit-based concordant computation for d > 2 (qudits) admits efficient classical simulation has 
remained entirely open so far. 

In this paper, we develop a new approach to simulating concordant computation which allows us, in many cases, to 
go beyond the limitations discussed above. This will be described in detail in Sec. HVIand Sec. El The key new idea 
is that it is often not necessary to acquire full knowledge of the symmetries of the quantum state in order to identify 
classically-efficiently-simulable trajectories. In practice, the strict requirements upon the gate to leave the state 
concordant will often allow suitable trajectories to be extracted from individual quantum gates alone, side-stepping 
the NP-hard bottleneck in Eastin’s algorithm with a tractable analysis on individual unitary gates. 

The heart of our algorithm is a sub-routine that we call the Local-Basis Finder (LBF). In Sec. IIV A1 we provide a 
detailed analysis of the LBF, which aims to identify the local basis rotation which forms the first of the three unitary 
gates (local rotation, classical reversible gate, local rotation) which act equivalently to the unitary gate applied in 
the circuit. Knowing this local rotation, the other two gates needed for the efficient trajectory simulation can be 
efficiently derived. 

We emphasise that a critical issue for the simulation of concordant computation is the effect of numerical errors, 
such as rounding errors. The set of concordant states has zero volume relative to the Hilbert space of all quantum 
states. Small perturbations on concordant states will generate non-classical correlations (discord) [h|, and necessarily 
disrupt the state symmetries which Eastin’s algorithm computes at every step. Thus simulation algorithms of this 
type have no tolerance to such errors, and they must therefore proceed via exact numerical calculation. We remark 
that even the new algorithms introduced in this paper, where computing state symmetries is not always necessarily, 
require an exact representation of the local basis of the state for their successful implementation. 

While Eastin did not consider this issue in El , we show that his approach can be adapted to exact arithmetic while 
remaining efficient. The adoption of exact arithmetic is a non-trivial and a key technical contribution of our work. 
Exact simulation means that one cannot use the approximate floating-point arithmetic typically used to approximate 
real or complex numbers in Physics simulations. We achieve this by adopting a combination of exact integer arithmetic 
and exact arithmetic on algebraic numbers. While the latter is not typically efficient [22, [23j, we show in Sec. IIV Bl 
that this computational cost is a fixed overhead and does not affect the scaling of our algorithm (and our exact version 
of Eastin’s algorithm). 


III. STRUCTURE OF CONCORDANT STATES AND EQUIVALENT CIRCUITS FOR SIMULATING 

CONCORDANT COMPUTATION 

In this section, we revisit the key results of Ref. El , using an alternative argument with some new techniques that 
clarify how the approach works. In Sec. IIII Al we introduce new tools for understanding a notion of degeneracy which 
plays a central role in Ref. El and in the current work. Then, in Sec. IIII 51 we formally derive the general method for 
trajectory-based simulation of concordant computation using these tools. We also review the difficulties encountered 
in Ref. [111 ] which centre around a step in the simulation algorithm — termed “Diagnosing the degeneracy” — which 
attempts to identity symmetries of the system state. 


A. Subsystem-eigenprojector decomposition for quantum-classical and concordant states 


We begin by introducing a key notion of classicality relevant for concordant computation. A state p , with subsystems 
labeled a and b, is said to be classical with respect to b if there exists a complete set of rank-one projectors {tt^} on 
b such that p = J2k 7r k’' > P 7T< k' > ’ or equivalently p = J2 k PkP^ ® 7r [ & ' > where {pk} is a probability distribution. A state 
of this form is sometimes referred to as a quantum-classical state , and the p\^ are sometimes denoted conditional 
density matrices fl9| . 

Consider the unnormalised conditional density matrix, 


-(a) (a) 

Pk =PkP\ k ■ 


(3) 


This operator satisfies an equation reminiscent of an eigenvalue equation, 


( b ) ~(a) 

pK -pi 


‘fc ! 


(4) 
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with pj. 0 "* playing the role of the eigenvalue and the role of the eigenprojector. We shall see that in fact these opera¬ 
tors do satisfy many of the properties of eigenvalues and eigenvectors respectively, and thereby provide a generalisation 
of them. 


Definition A subsystem operator-valued eigenvalue (SOVE) pand corresponding subsystem eigenprojector (SE) 
7i -( b ' > is any pair of such operators that satisfy, 

p (a (a) ® 4 b) ) = (i (a) ® 4 b) ) p = Pk ] ® 4 b) , (5) 

where is assumed have rank one, p^ has support solely on a and has support solely on b. 


For a state p which is classical with respect to sub-system b, a SOVE can be computed from any corresponding SE 
via the equation p^ = Trj, (1 0 n^)p . 


Lemma III.l Suppose that p^ and p^ are SOVEs with corresponding SEs ir^ and n r^. If the SOVEs are distinct 
then the SEs must be orthogonal. 


Proof The proof is identical to a well-known proof of the orthogonality of eigenprojectors with distinct eigenvalues, 
and we include it for completeness: We have p ^1^“) 0 7rJ b ^ = p ^ 0 7r^ and ^1^ 0 p = p^ 0 Hence, 

p^ 0Tr = p^ 0 Tr but if Pi^ ^ P 2 , then the only solution to this equation is Tr 7rJ b ^ = 0. 


For any given quantum-classical state p = J2k 7r k > ' > P 7T k > ' > > the SOVEs can be degenerate , i.e. there can be two (or 
more) rank-one projectors tt^' 1 and such that PkP^ k = Pk/ P\^■ Note however that the decomposition of p here 
has a form reminiscent of a spectral decomposition of a Hermitian operator into eigenvalues and eigenprojectors. An 
elementary result is that sets of orthogonal rank-one eigenprojectors of Hermitian operators are not unique when 
the spectrum includes degenerate eigenvalues, and that uniqueness is recovered when rank-one eigenprojectors are 
combined into full-rank eigenprojectors, corresponding to maximal subsets of rank-one eigenprojectors for distinct 
eigenvalues. 


Remark For any finite-dimensional Hermitian operator p, there is a unique set of full-rank projectors n*, such that, 
p = Efc Tr (pfffc) life, which also satisfy J2 k n fc = 1 and n fcP = P n fc = Tr (P n fc) n fc- 

Here the uniqueness follows from the full-rank property, and the orthogonality of the eigenprojectors associated with 
different eigenvalues. Alternatively, it follows directly as a corollory of Lemma 1111.21 below. 

Proceeding now by analogy, we make the following definitions: 


Definition Let p be quantum-classical state with respect to a bipartition into subsystems a and b. We define a 
Full-Rank Subsystem Eigenprojector (FRASE) for p to be any SE nl b \ with rank > 1, for which there does not exist 


any on b such that p 10 

Then we call a decomposition of the form, 


~(a) 

= P k 


(n^+TrW), where p£° = Tr b [(l0nf) p] /Tr h (njj?). 


E ~(a) 

Pk 

k 


>n 


(*>) 


( 6 ) 


a FRASE decomposition , where are orthogonal FRASEs satisfying H^'* = 1, and every p k ^ is a distinct 
Hermitian operator on a. 

It is now straightforward to prove that FRASE decompositions can be made along similar lines to expansions of 
Hermitian operators in their eigenvalues and full-rank eigenprojectors: 

Lemma III.2 Every p which is quantum-classical with respect to a bipartition into subsystems a and b, possesses a 
unique FRASE decomposition as given by Eq. (SJ). 


Proof The existence of a FRASE decomposition for any quantum-classical state follows directly from the definition 
for these states given above (by combining SEs for degenereate SOVEs). The uniqueness follows immediately from the 
fact that each FRASE is a full-rank projector onto a subspace, and a full-rank projector onto a sub-space is unique. 
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FRASEs satisfy many similar properties to full-rank eigenprojectors, and the standard definition of eigenprojectors is 
recovered as b is extended to the whole system. The uniqueness of FRASE decomposition underpins the simulation 
methods in this paper. 

Now Lemma llII.2l above gives rise to the following corollary for concordant states, which provides a useful uniqueness 
argument which will be needed later on: 

Corollary III.3 Any subset b of qudits in a concordant state p = ^ Lp(x)|x)(x|lA, where x = ■ ■ ■} 

labels the computational basis and L = LL^'" denotes local-unitary rotations for every qudit, has a unique set 
of FRASEs {n[ 6 '} which possess a product basis (that is to say the FRASEs are a sum of orthogonal product states). 

Proof The restrictions of the components of p to the subset of qudits in b , L^ |x^ b ^)(x^ b) provide an orthogonal 

set of subsystem eigenprojectors from which a unique set of FRASEs can be constructed following Lemma IIII.2I 

It is important to note that the local basis for b itself may not be unique, as is the case for example when the 
subsystem-eigenprojector decomposition yields one FRASE which is the maximally-mixed state. 


B. Monte-Carlo simulation of concordant computation 

Now we formalize the central challenge tackled in this paper and Ref. Our notation is as follows: We are 

given unitary circuit G, consisting of unitary gates Gt for the t th time step, on a system of N qudits (with arbi¬ 
trary dimensions). The partially-completed unitary after the f th step is Ut = TTl t k _ 1 Gk (where T denotes that the 
product respects the temporal ordering of the unitaries). We are also given po which must admit a polynomially- 
sized description, and the promise that the state of the system at every step, p t = UtPoU\ (where po is of the form 
Pt = '^2 x pt(x)L t \'x.)(x\Ll and where Lq and po only are given as part of the specification of the problem). Then the 
overarching goal can be stated as: Find an equivalent circuit C", consisting of unitary gates G' t with partial comple¬ 
tion of the circuit U( = TII^G^, such that U t poU) = U(.poU(\ but for which there are corresponding pure-state 
trajectories which are known to be efficiently simulable. 

Using the results of Sec. IIII A1 we can now proceed to derive the general form of a circuit suitable for simulating 
C. At time step t, Gt defines a bipartition of the system qudits into its support b and the rest a. The system state 
after t — 1 has form 


E ~(< 

Pk 


;(“) 


>n 


(b) 


(7) 


® nj: 6) ) p t - 1 / Tr& and jllj^ j is the unique set of FRASEs following Corollary IIII. 31 

i-i-v-rV ,n 1 T ^) I v (b} \/„(b) I T (b)t 111/... nt f lio ..r.U .-.f lime uton t 


where p^ = Tr;, 

which are sums of orthogonal product states |x^ b ^)(x^ b )| L^)\. Then, at the end of time step t 

t 


Pt — G t pt-iG t 




(a) 


'GtH^Gl 


( 8 ) 


x( b ))<x( b )| 4 b)t by 


By inspection, the operators G t II^G| are subsystem eigenprojectors of p t for the same bipartition. Since the 
are FRASEs for pt-i, and G* is unitary, it follows that the GJl^G^ are FRASEs for p t . Furthermore since 

p t is concordant, the operators G t nj,^G tt must be a sum of orthogonal product states 
Corollarv lIII.3l The uniqueness property of FRASEs now gives us the following key equation: 

GtR^G^A^AA^In^A^^Af^ Vfc, (9) 

where D t accounts for the possibility of a permutation of the computational-basis states on b. (Note that there is 
typically freedom in choices for D t .) 

For the system state we have the equivalence, 

t 


Pt — G t pt.-iG t 

= L^D t L^l\p t - 1 L^l 1 D\Lf ) \ 


(10) 
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From this equation it should be noted that L t agrees with L t -i other than (possibly) on the b, and that p t (x) = 
Pt-i ({y, Df 1 (z)}^ (where z is the part of x in b , y is the part of x not in b and we write D t |z) = \D t (z)}). For C as 
a whole we have, 

Pt = U t poUl 

= {TnU 1 (L k D k Ll_ 1 )}p 0 {ruU 1 (L k D k Ll_ 1 )}^ 

= L t {Tnl =1 D k }Llp 0 L 0 {rUl =1 D k yLl ( 11 ) 

To find a complete simulation algorithm for concordant computation, the challenge now is to find the gates L k 
and D k (for all time steps) which make up C", given the initial state and C. We will return to this challenge shortly 
however, and consider how the output statistics would be simulated supposing, for argument’s sake, that C' has 
already been found. First, we observe that a simulation algorithm does not need to compute an explicit description 
of the full state at every time step, but can instead just record changes to the system state using an update rule in 
keeping with Eq. m- A suitable update rule is: (i) record a new local basis, for every qudit in the support of the 
gate which acts, specified by a complete set of rank-1 projectors (which need not be unique); (ii) record a suitable 
permutation operator which acts on the support of the gate. Given such an update a rule, the output statistics of C' 
would be sampled as follows: 

Remark Monte-Carlo simulation, using a pre-calculated update rule, of the output statistics for a given concordant 
computation with initial state po = &jli Lq'* ^Po J \o)|0)(0|+Pq J \i)|1)(1|-1- 

• Define start of a stochastic trajectory, (in the computational basis), by randomly picking a iV-digit qudit string 
sin according to probability distributions Po^(-)>''' jPq N \')- 

t f 

• Permute Si n K > s ou t using TIi k=1 D k given by the update rule. 

• Sample probabilities l s i™t^X s out ^on specified qudits rrij with measurement bases 

This is equivalent to stochastically flipping some of the bits of s ou t with probabilities defined by the 
measurement basis. 

• Repeat, and gather statistics. 

Note that the local-basis changes at intermediate steps are not required to define the trajectories—only the basis 
of the final measurements. However, the intermediate local-basis projectors do play an essential role elsewhere for 
deriving the update rule from the initial specification of the concordant computation (as tackled in Sec. ED. 

The simulation algorithm described in Ref. 0 outlines a procedure for finding C’ from the specification of a 
concordant computation. For each time step f, the simulation algorithm works in three stages: The first stage is 
equivalent to finding the decomposition for p t -1 in Eq. Q (termed “diagnosing the degeneracy” in the reference), 
and is done by testing for the (permutation) symmetries of LqPqLq on the support of the partially-completed circuit 
Ut- The second and third stages were not described in detail in the reference, but are loosely equivalent to solving 
our Eq. (ED for and then D t . The first stage however constitutes an NP-hard problem in general (as explained in 
Ref. llj), which undermines the success of the simulation algorithm. 

One way around this is to place a restriction on C, so that po only needs to be tested for a limited group of 
symmetries. Ref. 0 made the stipulation that each G k should have support on one or two qubits only, as in this 
case the test can be limited to the set of classical-reversible gates which are linear [2ll [2(1 . Ref. 0 includes an 
efficient symmetry test for this case. Eastin’s proof of efficiency is not written using standard quantum information 
techniques. To aid the reader, therefore, we present an alternative formulation of this result here. We present a 
theorem and corollary, of which the latter is equivalent to Lemma 4 in 0- 

As we show below, the efficiency of Eastin’s test can be attributed to the fact that one- and two-qubit reversible 
classical gates (CNOT gates, NOT gates, and combinations) are in the Clifford group. After stating the more general 
Theorem IIII.4I we then derive Corollary 1 11 equivalent to Lemma 4 in 0. 


Theorem III.4 Suppose that LqPqLq = ^o^Po^o^ as above, which is factorised and diagonal in the 

computational basis, and that S a is Clifford unitary on the N qubits. Then: ScLqPoLqS}, = LqPqLq if and only 
if Tr (^ScL^pqLqS^ — L^paL^j Z^) g) jfA- 7 ') = 0 Vj. The expectation values Tr (^ScLqPqLqSI^ Z^> ® and 

Tr[(4p 0 Ao) ZW®lM can be computed efficiently, and the overall computational complexity for evaluating all 
the required expectation values scales quadratically with N. 












Proof See Appendix [A] 

Corollary III.5 FRASES can be computed efficiently for every step in a concordant computation on N qubits, for 
which the circuit C is composed entirely of one and two-qubit gates, and LqPqLq is factorised and diagonal in the 
computational basis. 

Proof At time step t, it is necessary to find the FRASES for p t -\ on the support b of Gt , and is equivalent to finding 
the full symmetry group of pt~\ on b. The promise of concordant computation implies that pt-i = pG 

where the Ilj^ are FRASEs, and it is required to find the projectors L t -\ in the computational basis (L t _i 

is known from the previous time step). This can be done by finding all classical reversible gates P on b satisfying, 

(D t - 1 • • • Di )* P (D t - 1 • ■ ■ Df) (l\pqLq^ ( D\ ■ ■ • D t -i)^ P (D± ■ ■ ■ D t ~ i) = l\pqLq. Under the restriction to one and 

two-qubit gates, all of the classical reversible gates D i, ■ ■ • , D t and P are Clifford gates (since they can be generated 
using only NOT and CNOT gates). Theorem IHI.dl can therefore be applied here, and it guarantees that all necessary 
symmetry tests can be performed efficiently. 


IV. NEW SIMULATION ALGORITHM FOR CONCORDANT COMPUTATION WITH GATES OF 

ARBITRARY SIZE 

The aim of this section is to develop an algorithm which can be used to simulate concordant computations which 
involve gates of arbitrary size. We continue using the notation introduced in Sec. IIII Bl for states, gates and circuits. 
To recap from Ref. Q the algorithm therein attempts to identity a FRASE decomposition at every time step (for 
the initial state po). As stated above, deriving the FRASE decomposition by considering the whole history of the 
computation and the symmetries of the input state is NP-hard, equivalent to testing satisfiability for an arbitrary 
Boolean function. 

The NP-hardness is avoided in Ref. [11] by restricting the simulation to concordant circuits composed of gates with 
support on only one or two qubits. Reversible one- and two-qubit gates are all in the Clifford group, and the efficiency 
of this algorithm is stated as Corollorv lIII.51 Here we develop an alternative approach. 

The requirement of concordance places strong conditions on every quantum gate. In particular, the FRASE de¬ 
composition for the state after a gate G can often be derived from the properties of G alone. Here we develop an 
algorithm which exploits this. The algorithm produces output equivalent to the one in Ref. fill ], namely a sequence 
of permutation gates D t and local-projector changes i —> L[ b ^\x.^)(x.^\L [ b ^, which are used to 

construct Monte-Carlo trajectories as described in Sec. IIII Bl 

In Sec. IIV Al we first derive the general structure of any gate which satisfies the promise of having concordant states 
for the input and output, and then we will present a general method for solving for its projectors {L^lx^Xx^ 
and D t from the given unitary Gt (and previously derived {L t _i|x^ b ^)(x^^jL^L-jJ). In Sec. HVBl we explain how each 
of the steps in the LBF can be implemented using exact arithmetic, and review the computational resources required. 


A. Local-basis-update equations and method of solution 


We start with a technical remark necessary to characterize all gates occurring in concordant computations: 

Remark Suppose that B is a partitioning of the labels of the computational-basis states for qudits in b. A unitary B 
is block diagonal with respect to the partitioning B of the computational basis of b , which is to say that the matrix 
representation is block-diagonal up to (identical) reordering of the rows and columns, if and only if B commutes with 
all projectors Xj = ^Cxe/h l x X x l ft e 

Proof 


( X | [B,Xj] |x'> 


{ 0 if x, x 7 e pj , 

0 if x,x 7 Pj, 

(x|R|x 7 ) if x £ Pj , x' e pj , 
-( x |R| x7 ) if X e Pj, x 7 Pj. 


Following the line of argument in Sec. IIII 51 every gate specified in a given concordant computation can be decomposed 
as follows: 
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Lemma IV. 1 Each gate Gt specified for concordant circuit C at time step t can be decomposed as 


G t = L[ b) D t B t L 


Ab) t 


( 12 ) 


where L^\ is the local unitary on the support b of Gt after all previous time steps, D t is a classical-reversible gate, 
and B t is block diagonal, being a direct sum of components which act identically on the projectors Xk = L[ b 2\lPi^ L[ b } 1 , 
where {n^} is the set of FRASEs for pt-i- 

Proof This follows immediately from Eq. [9] writing 


D[L t 


( 6 )t G t L^ 


t—i 




i) = (L^n^-r) (olL^GtL^) Vfc 


using the Remark above. 


The LBF exploits the guarantee of a decomposition of Gt as by Eq. (1121) . to solve for {L^lx^Xx^lL^} and 
D t . To do this, it runs over all possible computational-basis projectors Xk (of arbitrary rank) on the support b of 
Gt, and attempts to recover rank-one (pure) local-basis projectors by solving the following set of non-linear equations 
(for each qudit j in b ) to find unknown local basis projectors p®, 


l (fe/j) ® pW), GtL^XkL^Gl] = 0 (i) 

subject to, 

Tr (pW) = Tr (pW 2 ) = Tr (pW 3 ) = 1 (ii) 


(13) 


Solutions to these equations will be sums of local basis projectors that are mapped to local projectors by G. General 
solutions to (i), for each qudit, can be arbitrary linear combinations of the desired local projectors, and the constraints 
(ii) are required to solve for solutions that correspond to pure basis states. Specifically, constraints (ii) impose purity 
on general Hermitian operators. For qubits only the conditions on Tr(ptb) and Tr ( 2 ) are required, whilst the 
additional condition on Tr (p 3 ) is required when the dimension is greater than two)27|. 

Our method for solving Eq. ed is as follows: First Gaussian elimination is used to find a general Hermitian 
solution pW for Eq. (H(i). A random instance pw of p^ typically has support on the same local-basis projectors 
as Hence to derive rank-one solutions of Eq. (fTSl) f ij and (ii), a random choice is made for the eigenvalues 
of p® are found from its characteristic polynomial, and the corresponding eigenvector projectors are derived by back 
substitution into the eigenvector equation. (The process can be repeated to address rare cases where a bad choice is 
made for p^.) For each Xk, there are three types of solution to Eq. (ED: 

1. a complete local basis cannot be found for every qudit of b (in which case I-P‘\ XuL'^l does not correspond to 
a valid input FRASE). 

2. a complete local basis of unique rank-one projectors is found for each qudit of b. 

3. a complete local basis of rank-one projectors is found for each qudit of b , where at least some of the projectors 
are not unique. (This occurs when local-basis projectors are combined in GtLp\XkLpP[G\ and there is an 
infinite family of solutions.) 


Note that it is only the rank-one projectors, and not the local unitaries L t , that are needed for the simulation algorithm. 
In general the LBF finds a complete local basis only for a subset of the Xk, and we call this set x- The occurrence 
of non-unique local projectors could potentially cause difficulties when comparing results for different choices of Xk- 
For example, when Xk = 1 any complete local basis on & is a solution to Eq. ED- To address this, we define the 
Xk-unique local basis (for each qubit of b) as the unique combinations of rank-one projectors of minimal rank which 
are common to all possible local-basis solutions of Eq. (fl3l) . In other words, the projectors in a AVunique local basis 
are the smallest local-basis projectors which are uniquely determined by Eq. ED- We denote the A^-unique local 


basis projectors by <g) p]P, and they are easily found - for example by repeatedly solving for the local-basis. We 

denote by LBk the full set of Xk-unique local-basis projectors whenever it is defined. 

To implement a Monte-Carlo simulation of the concordant computation, as described in Sec. IIIIB1 there must be 
an unambiguous update rule for every time step. An unambiguous update rule can be obtained for time step t, only 
if the LBF is able to identify a unique set of local-basis projectors compatible with all LBk for Xk £ X■ For this to 
be possible, it is necessary that the elements of each LBk are common projector solutions for all Xk> £ X — that is 


to say that 


1 {b/j) ®fti ) ,G t Ll b 2 1 Xk'L[ b 2lGt 


= 0 V (g| p(j>> g LBk, Xk ' £ x — anc t the LBF must test all these 


conditions. When these conditions are met we term the LBk compatible, and the following lemma can be applied: 
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(i) Pseudo-code for LBF 

1. INPUT Gate Gt with support on qudits in b and the set of rank-one projectors L[ b } 1 |x^)(x^|Lj^ 

2. REPEAT For every projector X k in the computational basis 

SUBROUTINE: SOLVE for local-basis projectors of G t L t - 1 X k L\_ 1 G\ (see (ii) below). 

IF complete local-basis solution THEN RECORD AV-unique basis in LBt and fc in EX I ST S — LI ST 

3. REPEAT for every k, k' in EXISTS-LIST 

IF all projectors in LBk commute with GtLt-\Xy l\_ x g\ DO NOTHING 
ELSE RECORD “Incompatible solutions” 

4. IF “Incompatible solutions” THEN OUTPUT “Local-basis ambiguity at time step t” and STOP 
ELSE 

FIND Complete rank-one local-basis projector set compatible with LBk for all k in EXISTS — LI ST 
RECORD rank-one projectors in LBt, and Bt = {/3j | combining over x® € fij defines unique projectors} 
FIND (any) Dt consistent with GtLt-iXjLj_ 1 Gj = LtDtXjD\L\ Mj, where Xj = e /3 ■ l x ^)( x ^| 

5. OUTPUT LB t and D t 


(ii) Pseudo-code for solving local-basis equations 

1. REPEAT for each qudit j in (fa) and for confirmation of the solution 

SET p-V oc col + c/cr; f° r Hermitian basis matrices at and real expansion coefficients ci 
SET m = [auGtLt-^XkLl^Gl] 

SET linear-equation system S as matrix equation ^ cmi = 0 
APPLY gaussian elimination on E for general Hermitian solution for p 1 ' 1 ' 1 
PICK random instance of pV) by random choices for free ci 
APPLY root solver on characteristic equation of fP‘ 

APPLY back substitution into eigenvalue equation for rank-one eigenprojectors Lt|x^)(x^|Lf of p*- 6 -* 

2. IF complete local basis solution THEN FIND corresponding Afc-unique basis following Lemma llV.21 

FIG. 1. (i) Pseudo-code for LBF: uses promise of existence of a decomposition of gate Gt (following Lemma flV.il) . to find local 
projectors {Lt|x®)(x®|} and classical-reversible gate Dt on the support b of Gt- The pseudo-code heralds cases where Gt is 
consistent with multiple incompatible (complete) sets of local projectors, (ii) Pseudo-code for subroutine for solving nonlinear 
Eqs. (1131) for the local projectors. The subroutine can be repeated to exclude the possibility of pathological choices for p^h 
When a complete local basis is found for all qudits in b, the subroutine returns the A^-unique version of it; one way to find the 
AT-unique basis is by using the promise of projectors with integer matrices (see Sec. II V Bl) . 


Lemma IV.2 When the LB k are compatible \/X k G x? a complete set of rank-one local-basis projectors solutions can 
be constructed, |L t |x( b ))(x( b )|Lf | j, for which L t |x( b ))(x^ b ^|L| , GtLf^XkL^^G] = 0 Vx^,Aj. G x- 

j non-zero projectors ® [pu}pul ■ ■ • Pu\ x ^j |Vl^ b/,J ^ ® Pul G LB’ t is a complete 


Proof The set LB', = 


local-basis projector set, and represents a fine-graining of all the LBk , and U <h ^'^ 0 plf 1 , G t L^\Xk'L^}\ g\ = 0 

VI (b/j) ®pl i} G LB' t , X k> G X- To find a complete set of rank-one local-basis projectors solutions, the projectors p 


U) 


where 1 £ LB t can be decomposed into rank-one projectors. The vectors that correspond to these rank-one 


projectors can be obtained by orthogonalising the set of column-vector entries of p 
finally the projectors can be renormalised to have trace 1. 


U) 


using a Gram-Schmidt process; 


A summary of the key steps of our LBF routine in pseudo code is given in Fig. [TJ The possibility and implications of 
gates having multiple inconsistent local-basis solutions is taken up in Sec. |V] 


B. Implementing the LBF using exact arithmetic 


Errors in the local projectors from time step t — 1 in our simulation algorithm, I/^-Jx^Xx^lL^j, can cause a 

failure to find a complete local basis for GtL^\XkL^}\G\ for time step t, even though one must exist for the error- 
free case by the promise of concordant computation. The simulation algorithm proposed in Sec. II VI A has no way of 












11 


detecting and correcting errors, and they must be prevented from occurring. To address this issue, we look in detail at 
implementation of our simulation algorithm using integer arithmetic. Important goals here are to avoid unreasonable 
restrictions on the form of the concordant computations which can be simulated using our algorithm, and to ensure 
that the LBF does not incur excessive demands on computational resources, which should scale polynomially with 
the number of time steps with reasonable constraints on number size and memory. We permit irrational numbers 
in our simulation algorithm when they can be handled using (integer-based) exact arithmetic, and we have found it 
necessary to involve computations on (irrational) algebraic numbers for some intermediate steps. 

First we modify the definitions of concordant states and concordant computation used thus far. We call a gate 
or projector rational if it its matrix representation in the computational basis has only rational entries. Aug¬ 
menting the definition of a concordant state given in Sec. IIII A1 we define a concordant state p to be rationally 

fj) 

concordant if every subsystem possesses a complete set of rational orthogonal rank-1 projectors n k ' such that 

p = k2 ... p(ki, & 2 , • • •) 7r^ (g) 7 ■ ■ ■ , and p(k±, k^, • • •) is a rational probability distribution. Then we can de¬ 
fine a rationally-concordant computation as a concordant computation for which the system states are also rationally 
concordant at every time step, and in addition the projectors and probability distributions defining the initial state 
are rational, as are the gates Gt for all time steps. As an aside, we point out that our choice to use projectors to 
represent local bases in our simulation algorithm, (rather than the matrices themselves), prevents many standard 
gates and states from being excluded by the definition of rationally-concordant computation here. Our aim is to 
involve local rotations which are proportional to (complex)-integer matrices but (generically) have irrational (surd) 
normalisation factors, such as the Hadamard gate. For projectors defined using (complex) integer or rational entries, 
normalization proceeds by dividing out the trace, and surds are avoided. However a gate such as the n/8 gate, which is 

( o (l +i)/y/ 2 j ■ mus t be excluded whenever it would generate surd factors between entries of a local projector occurring 
in the simulation. 

Lemma IV.3 An implementation of the LBF described in Sec. I IV A\ using exact arithmetic finds a complete set of 
rational rank-one projectors for |x( b ))(x( b ) | given rational Gt and rational rank-1 local basis projectors 

|x( b ))(x( b )| provided all possible local basis solutions for the gate are compatible. Computations using 

algebraic numbers can be required at intermediate steps. 

Proof Part (i) is for the LBF applied to a single projector, GtL t -\XkL\_ 1 G\. Part (ii) is for finding LB t from the 
LBk when they are compatible, and for resolving higher-rank projectors into rational rank-1 projectors. 

(i) We refer to Fig. |TJii) for the steps involved in solving for X^-unique local basis projectors, and we give an 
implementation for them using integer computations: By making integer choices for the Hermitian basis matrices eq, 
an integer system of equations 3 can be obtained (for a specific qudit j). A Gaussian elimination method can be 
applied to 3 to solve for the general integer Hermitian solution p <JI . More specifically, the Hermite normal form for 
3, (an analogue of reduced echelon form for matrices over the integers), can be obtained in polynomial time using 
Bareiss’s algorithm, without suffering an exponential blowup in the memory requirements (see chapter 10 of Ref. [22j] ). 
A random choice for (Hermitian) p^ can be made which is an integer matrix (by integer choices of the free variables 
post Gaussian elimination). 

The characteristic equation for the eigenvalues of f/ 3> is then a (real) monic polynomial with integer coefficients, 
and its solutions must be real. The elementary rational root test for polynomials dictates that the roots are either 
integer or irrational algebraic numbers. Both integer and irrational roots play an essential role for finding local- 
basis projectors. Hence we note that exact arithmetic operations can be performed on algebraic numbers using only 
integer/rational computations. This can be done by manipulations of polynomials defining the algebraic numbers, for 
example using an encoding for which the polynomials are represented by companion matrices and held operations are 
performed using matrix manipulations (see Ref. [23,] for an introductory discussion on this). 

A method based on Sturm’s theorem can be used to find the eigenvalues of p^ (for a treatment of Sturm’s theorem 
see chapter 7 of SI). This theorem can be applied to the characteristic equation for p^ to find the number of 
distinct roots in any arbitrary interval (Ji,/ 2 ], by using a Sturm sequence for the characteristic polynomial. More 
specifically, the number of roots in the interval is given by the difference in the number of sign changes for the values 
of Sturm sequence when evaluated at Ii and I 2 ■ The eigenvalues of p^ can be found by a simple search method 
which repeatedly bisects a starting interval, at each step selecting one half interval which contains at least one root. 
This search method finds the eigenvalues exactly when they are integer, and it generates an isolating interval when 
the eigenvalues are irrational. 

Once the eigenvalues of p^ have been found, back substitution is used to find the rank-1 projector solutions. These 
solutions must be renormalised to have trace value 1. Integer eigenvalues lead to rational eigenprojectors, which are 
already part of the required X&—unique local-basis solution. The existence of rational local-basis projector solutions 
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with rank greater leads to eigenvalues which are irrational, and the associated rank-1 eigenprojectors must also contain 
irrational numbers. It is necessary to test combinations of these rank-1 eigenprojectors to find higher-rank projectors 
which are rational overall. The minimal-rank rational projectors formed this way must be added to the A^-unique 
local-basis solution. The promise of rational concordant computation guarantees that a complete rational local basis 
can be found for at least one Xf. 

(ii)When the LB *. are compatible, we can find a fined-grained complete local-basis projector set with elements of 
rank> 1 following the Proof of Lemma llV.21 above. It is necessary to verify that higher-rank rational projectors can 
be decomposed into rank-one projectors which are also rational. For this we employ a modified form of Gram-Schmidt 
as follows: Let vi,V 2 ,- ■ • be the column vectors of projector pu\ The vectors v[,v 2 ,- ■ ■ defined as, 


v i = Vi 

V 2 = { v l, v l)v2— (l’2, v'i)*v[ 

V 3 = (V2,v2)(v' 1 ,v , 1 )v 3 -(v 3 ,v , 2 )*{v[,v' 1 )v' 2 -(v 3 ,v' 1 )*(v , 2 ,v , 2 )v , 1 

v 4 = <^3 = ^3 ) ( ^2 ^2 ) () ^4 — (^4 , ^3 ) * (^2 = ^2 ) (^1 = ~ (^4 , ^2 ) * <^ 3 » ^ 3 > < V 1 = > ^2 ~ (^ 4 , > * (^3 > <^ 2 . ^ 2 > 

etc. 


are rational and orthogonal. The rank-one rational projectors \vi)(vi\/Tr(\vi)(vi\),\v 2 )(v 2 \/Tr(\v 2 )(v 2 \), -'' give the 
required decomposition into rank-one projectors. 

Next we consider the computational requirements for the LBF implementation described in Lemma. IIV.3I (for exact 
computation). First of all, we observe that the (worst-case) computational overhead for the LBF scales poorly with 
gate size: For a gate with support having dimension d , the total number of projectors of all ranks to which the LBF 
might be applied scales as 0(2 d ), a scaling which is doubly exponential with respect to the number of qudits. In the 
worst case, the LBF finds a local-basis solution for only one pair of input projectors, and the LBF must be applied 
to input projectors of all rank between 1 and [d/2] (note that II and 1 — II must share the same local basis). (In 
the simplest case, the LBF is applied first to all possible one-dimensional projectors and the output projectors are 
found to carry the same local-basis, in which case it not necessary to check higher-rank input projectors.) Hence when 
considering the computational complexity, we consider the dependence on circuit size for a fixed maximum gate size. 

Focusing now on the complexity for computations performed by the LBF given a specific choice of input projector, 
we can see that all steps involved can be performed efficiently. The key mathematical steps used by the LBF are: 
Gaussian elimination (and back substitution), for which complexity scales polynomially with respect to the matrices 
involved, and root finding, which is efficient due to the use of a bisection method. Furthermore, the majority of the 
calculations performed by the LBF use only integer matrices. Where irrational numbers do occur, however, there are 
large computational overheads due to the need to perform arithmetic operations using algebraic numbers with no loss 
of precision. The promise of rational concordant computation ensures that all irrational contributions must cancel for 
the output. Consequently, the computational cost for handling algebraic numbers can be regarded as a fixed overhead 
that does not undermine the efficiency of the LBF, for increasing numbers of gates. 

The computational requirements for the LBF are addressed by the following lemma: 

Lemma IV.4 The computational complexity to solve for local-basis updates following Lemma \lV.3\ scales, in regard 
of both time and space (memory), polynomially with respect to the number of circuit gates (for a fixed maximum gate 
size), and the total bits required to represent each gate and the initial state of each qudit. 

Proof See Appendix [Bj 


V. CONCORDANT COMPUTATION WITH GATES WHICH ARE CONSISTENT WITH 
INCOMPATIBLE CHOICES FOR THE LOCAL BASIS 

The LBF cannot be successfully applied in all cases: It is possible that the output of the LBF for a given unitary 
is not unique. In this case, the LBF generates incompatible multiple solutions (where the notation of compatibility is 
laid out in Sec. IIV AT) . This causes the simulation algorithm to fail, since an incorrect local basis may be chosen which 
would then cause the simulation algorithm to generate an entangling trajectory. A linear number of such events would 
lead to an exponential number of trajectories which would need to be tested, leading to an inefficient algorithm. Here, 
we explore some cases where this non-uniqueness arises. Note that the LBF will commonly output multiple outputs 
in ways which do not disrupt the algorithm, arising for example from reordering of the local-basis projectors. We are 
not interested in such cases here, since they are easy to identify and unproblematic, and we focus only on cases where 
the outputs are truly incompatible. 
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Where the LBF outputs such incompatible solutions, there is ambiguity for the corresponding local-basis update, 
and additional information is required to derive valid simulation trajectories. We have tested our LBF numerically, 
by applying it to gates of the form G = LDBL' where L, D , B and L' were generated randomly in keeping with the 
general form laid out in Lemma llV.il In our numerical studies, we found that the LBF did not output incompatible 
solutions in a large number of cases. However, we have also found special cases where the LBF outputs incompatible 
solutions for local-basis projectors depending on how one-dimensional projectors for the input are combined, which 
we now illustrate using examples. 

Our first special case is given by the gate G exc .1 which maps the computational basis to the basis of Bell states, 
G eX c.i : |J, k) i —y (g> X k ^ j . It is convenient to write the action of G exc .1 in the Pauli basis: 

Gexc.i |oo)(oo| g\ xc1 = i(i + i®i-y«y + z®z) 

Gexc.i 101X011 G\ xcl = i(l +X®X + Y®Y-Z®Z) 

Gexc.i | loxioi gL.i = + + 

Gexc.i |11)(11| gL.i = -X®X-Y®Y-Z®Z) (14) 

Noting that |00)(00| + |llXll| = \ (1 + ZZ), and that by local rotations from the Z-basis to the X and Y— bases also 

I++X+-H + |-X-1 = 5(1 + XX) and \+i + *)(+* + i\ + \-i - i){-i - i\ = \ (1 + YY), we can see that G eX c.i 

maps rank-two projectors in the computational basis to rank-two FRASEs carrying either the X, Y or Z basis for 
both qubits. (Note that G eX c.imust assign the same local basis both to a projector and the difference of that projector 
with the identity.) 

Gexc.i has a rather exceptional structure that exploits special features of the Bell states. In contrast, a generic class of 
gates which generate FRASE’s carrying incompatible local-bases is provided by controlled local unitaries. A simple ex¬ 
ample for two qubits would be gate G ex c .2 = cU which implements a rotation U on qubit 2 from the computation basis 
to any-other qubit basis, controlled by qubit 1. For the set of input FRASES {|00)(00|, |01)(01|, |1)(1| ® 1}, G exc . 2 out¬ 
puts FRASES with the computational basis for both qubits. For the set of input FRASES {|10)(10|, |11)(11|, |0)(0| g) 1 }, 
Gexc .2 outputs FRASES with the computational basis for qubit 1, and the rotated basis for qubit 2. Similar examples 
can be easily constructed for gates with support on arbitrary numbers of qudits, with arbitrary dimensions. 

Our LBF routine heralds the occurrence of incompatible local-basis solutions, but is forced to stop in such cases 
in the absence of additional information concerning the correct solution to choose. One possibility is to consider 
restricted instances of concordant computations which use only gates which do not generate incompatible solutions. 
When however it is necessary to consider gates which generate incompatible solutions, it is clear that efficient heuristic 
tests will suffice to resolve local-basis ambiguities in many cases. Another approach is to apply the LBF to extended 
sequences of gates with the aim of finding a unique local-basis update overall. The efficiency of our simulation proce¬ 
dure however is only preserved when the local-basis ambiguity extends over gate sequences which scale logarithmically 
with respect to the number of circuit gates. 


VI. CONCLUSIONS 

In this paper we contribute several new results on the problem of constructing efficient classical simulations for 
concordant computation. These new results include a method to solve for local-basis updates using exact arithmetic, 
which we prove is efficient. However, our results fall short of a proof that all instances of concordant computation 
admit efficient simulation or, on the contrary, that this is impossible in principle. The fundamental difficulty for any 
simulation of concordant computation is the need to test properties for the full quantum state. These tests generically 
involve exponenti ally -large matrices (other than in some special cases) and hence are inefficient. The simulation 
procedure of Ref. [ll]] involves symmetry tests on the full quantum state. It was proved by the author that these tests 
are computationally equivalent to solving 3-SAT, an NP-Complete problem, proving that the simulation cannot be 
efficient in general. 

In this paper we have progressed beyond the simulation procedure in Ref. a by attempting to bypass exhaus¬ 
tive symmetry testing on the quantum state. Our alternative simulation procedure attempts to derive simulation 
trajectories directly from the circuit which is supplied in the problem. This approach can work as the concordant 
promise heavily constrains the structure of the gates that make up the circuit. Consequently our analysis has focused 
on our LBF subroutine, which is for deriving local-basis updates directly from the gates and input local-basis pro¬ 
jectors. Our most important contribution is proof that local-basis updates can be computed efficiently using exact 
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arithmetic. Furthermore, our investigations have uncovered two classes of special gates for which the LBF outputs 
multiple incompatible choices for the local-basis updates. In such cases additional information about the quantum 
state is required to derive valid simulation trajectories. We leave the problem of characterizing the full set of special 
gates as an open challenge. It is also important to determine if these gates can be used to generate examples of 
concordant computation that cannot be reduced to those in the class of probabilistic reversible classical computation. 

An entirely different approach to overcoming the inefficient symmetry tests of Ref. [11] would be to replace NP- 
hard exact symmetry tests on the quantum state with efficient probabilistic sampling. Using ideas in Ref. j28[, it is 
possible to devise efficient symmetry tests based on random sampling, where the probability for error is exponentially 
suppressed. However, this approach gives rise to two challenges. The first is to understand the effects of errors within 
the simulation, which have been circumvented in this paper by using exact methods. The second is to understand 
cases when probabilistic methods fundamentally cannot work. Algorithms are highly structured by nature, and 
typically they are not well modelled as random processes. It is possible that there are scenarios involving concordant 
computation which provably require knowledge of rare hard instances to achieve valid output statistics, where the 
hard instances foil probabilistic tests on the quantum state. We leave a full analysis of these issues as an open problem. 
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Appendix A: Proof of Theorem IIII.4I 


Theorem IIII.4I Suppose that LqP 0 L 0 = 8^1 which is factorised and diagonal in the computa¬ 

tional basis, and that S 0 , is Clifford unitary on the N qubits. Then: ScL^pqLqS}, = L^poLg if and only if 


Tr 

Tr 


(s a Llp 0 L 0 Sl — lIpqLo^ Zti) ® lAi) = o Vj. The expectation values Tr (^S a LQp 0 L 0 Sl^ 8 fAt) 

Z« <g> lAU 


and 


can be computed efficiently, and the overall computational complexity for evaluating 


all the required expectation values scales quadratically with N. 


(s„Llp 0 L 0 Sl - LjpoAo) ZW ® 1^> 


Proof The forward direction is trivial. For the reverse, we assume that Tr 
0 Vj. It is necessary to establish that S a leaves all products of Z and 1 operators unchanged. (Trivally this holds 
for the identity.) Since LjpoTo is diagonal in the computational basis, for each qubit j which is pure, |0) (or 1 1 )), 


Tr [(. LipoLo ) Z«> (8) l ( \ j) 


are 1 (or — 1 ); furthermore, | 0 ) (or | 1 )) is the only 


and Tr Z® ® l^i) 

possible qubit state which gives 1 (or -1). Hence equality of S a Llp 0 L 0 Sl and L\p$L 0 is established for the pure 
qubits. For the remaining qubits L^^p^L^ = i and we denote the unique values for the qj by 

Qi,Q 2 , ■ ■ ■, where 1 > Q\ > Q 2 > Q 3 ■ • • > —1. We also denote corresponding subsets of qubits with q 7 = Q k by 
J{Qk)- The effect of S a on l} q pqLq in the Pauli basis is to permute the expansion coefficients for L^pqLq amongst 
the basis elements. 

We now consider basis elements corresponding to the largest expansion coefficient Q\ in L^pqLq and S a L' 0 poLoSl. 
For each j £ J(Qi), Tr [(4/Ooio) Z& ® = Tr Z^ ® => S a Z^ ® = ZW ® 

where qu S J(Qi). An analogous statement also holds when there are multiple Z operators for multiple qubits 
of J(Qr), and so S„ [® je j( Ql) | (l 0) + QiZ^)] ® l(/AQi)) 5 t = + Q X Z^ ® also. 

Proceeding now to Q 2 , a similar argument can be made as for Q\. The following now holds for each j £ J(Q 2 )'- 

S a Z W) <8 l^St = Z<® ® l ( \ fe) where q k £ 


Tr [(4 poT 0 ) ZW 8 


= Tr 


(s a Llp 0 L 0 Sl) Z«> 8 

J(Q 2 ): a possible ambiguity might be considered when the value of Q 2 coincides with a power of Q 1 , but in this 
case the corresponding basis element is already accounted for in the previous step. Hence it can be concluded that 


^• e 7(Q 2 )5 (a (j) +q 2 zw)] 8 u(/^))st = 


]L(MQ2)) The 

same argument can 


UeJ(Q 2 )5 ( l0) + QiZb)^ 

now be applied for Q 3 . Possible ambiguities which might be considered when the value of Q 3 coincides with a product 
of Qi’s and Q 2 ’s can be discounted, because the corresponding basis elements are accounted for previously. 
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By induction it follows that S^L^pqLqS^ = LqPqLq. Finally, all expectation values that must be computed for 
the theorem are of the form for a Pauli product operator acting on LqPqLq, each of these can be computed (via 
Gottesman-Knill theorem [ 3 } techniques) with linear complexity in N, there are 2N such expectation values to 
compute and therefore the overall scaling is quadratic. 


Appendix B: Proof of Lemma IIV.41 

Lemma IIV.4I The computational complexity to solve for local-basis updates following Lemma TlV. 3 1 scales, in regard 
of both time and space (memory), polynomially with respect to the number of circuit gates (for a fixed maximum 
gate size), and the total bits required to represent each gate and the initial state of each qudit. 

Proof We will consider an arbitrary time step within the simulation, for which the LBF is applied to gate G* and 
qudit j, with dimension dj. The dimensions of the support of Gt is d. Estimates for the computational requirements 
of the different types of mathematical steps involved are as follows: 

• Arithmetic on integers: The computation requirements for multiplication dominate over those for addition. 
The (time) complexity for multiplying two /-digit numbers scales as 0(l 2 ) and the output has 21 digits. For 
multiplication of matrices involving /-digit integer entries (for real and imaginary parts), the digit length for 
entries of the output matrix scales linearly with /, while the time complexity scales as third order in the matrix 
dimensions. 

• Gaussian elimination/back substitution: Gaussian elimination and back substitution are used by the LBF first 
to find a solution pW to Eq. m, and then to find eigenprojectors for p^ (given specific eigenvalues). The 
complexity for Gaussian elimination (which has cubic scaling with respect to the number of unknowns) dominates 
that for back substitution (for which the scaling is quadratic). The linear system H which must be solved to 
find pW is overdetermined, and requires Gaussian elimination on a matrix with dimensions 0(d 2 ) by 0(d 2 ). 
When Bareiss’s algorithm is used, the number of elementary steps is 0(d 2 dj) and the maximum number size is 
0{d : j (log d.j + /)), where / is the maximum number of digits for the matrix entries at the start (see Lecture 10 
of |22jp. Finding eigenprojectors for pvl requires Gaussian elimination to a matrix with dimensions dj by dj, 
which is integer for the case of an integer eigenvalue, but which has contributions which are algebraic numbers 
when the eigenvalue is an algebraic number. 

• Root finding for integer polynomials of degree p: Root finding must be used to find the eigenvalues of p^ (for 
which roots must be found for its characteristic equation of degree p = dj), and it is also used for arithmetic 
operations on algebraic numbers. Our method for root finding uses Sturm’s theorem (see Lecture 7 of Ref. 0 . 
Sturm’s theorem states that the existence of roots within a given interval can be detected by evaluating a Sturm 
chain of p + 1 polynomials at both interval endpoints, and taking the difference of the number of sign changes 
for the chain. By testing for the existence of roots within each interval, the search region can be repeatedly 
subdivided to locate the roots. If / is the maximum digit-length of the polynomial coefficients, then the initial 
search region can be taken to be of size 0(2 l ) (e.g. using Cauchy’s bound for polynomial roots). For integer 
solutions the smallest search interval has length 2, and the number of bisections required to locate a root is 
0(1). For irrational solutions, an additional running time 0(p 2 l 2 ) is sufficient to identify an isolating interval 
(see Lecture 6 of Ref. 0). 

• Arithmetic on algebraic numbers: Gaussian elimination and back substitution must be performed on matrices 
mixing integers with irrational algebraic numbers when the eigenvalues of p^ are irrational. One method for 
performing arithmetic on algebraic numbers is as follows: Every algebraic number can be represented as an 
integer polynomial which has the number as a root, together with an isolating interval (e.g. see Lecture 6 
of Ref. |22[). Arithmetic operations (i.e. addition, multiplication, number comparison, etc) can be done by 
perfor ming simple computations on companion matrices associated with the polynomials (described for example 
in Ref. [23]), together with updates to isolating intervals using the bisection method described in above. There 
is a large overhead for executing these arithmetic operations due to the need for Kronecker (tensor) product 
operations on the companion matrices. 
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